Restless Bandits with Constrained Arms: Applications in Social and Information Networks
Authors
Abstract
We study the problem of information gathering in a social network with dynamically available sources and time-varying quality of information. We formulate this problem as a restless multi-armed bandit (RMAB), in which the information quality of a source corresponds to the state of an arm. The decision-making agent does not know the quality of information from the sources a priori, but it maintains a belief about the quality of information from each source. This makes the problem an RMAB with partially observable states. The objective of the agent is to gather relevant information efficiently by contacting sources. We formulate this as an infinite-horizon discounted-reward problem, where the reward depends on the quality of information. We study Whittle’s index policy, which determines the sequence in which arms are played so as to maximize long-term cumulative reward. Through numerical simulation, we illustrate the performance of the index policy and the myopic policy and compare them with a uniform random policy.
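The belief tracking and the myopic and uniform random baselines mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a two-state (good/bad) Markov model per source, binary signals, and a simple random availability rule; all function names and parameter values here are hypothetical and do not come from the paper, and the Whittle index computation is not shown.

```python
import random


def predict(belief, p_stay_good, p_recover):
    """Propagate P(state = good) one step through the assumed Markov chain."""
    return belief * p_stay_good + (1.0 - belief) * p_recover


def bayes_update(belief, good_signal, rho_good, rho_bad):
    """Posterior P(good) after a play; rho_* = P(good signal | state)."""
    like_good = rho_good if good_signal else 1.0 - rho_good
    like_bad = rho_bad if good_signal else 1.0 - rho_bad
    numerator = belief * like_good
    return numerator / (numerator + (1.0 - belief) * like_bad)


def myopic_policy(beliefs, available, rng):
    """Play the available arm with the highest current belief."""
    return max(available, key=lambda i: beliefs[i])


def uniform_policy(beliefs, available, rng):
    """Play an available arm chosen uniformly at random."""
    return rng.choice(available)


def simulate(policy, horizon=200, discount=0.95, seed=0):
    """Return the discounted reward of one run (all parameters are assumptions)."""
    rng = random.Random(seed)
    p_stay = [0.9, 0.8, 0.7]   # assumed P(good -> good) per source
    p_rec = [0.1, 0.2, 0.3]    # assumed P(bad -> good) per source
    rho_good, rho_bad = 0.9, 0.2
    n = len(p_stay)
    states = [1] * n           # hidden true states (1 = good)
    beliefs = [0.5] * n        # agent's belief that each source is good
    total = 0.0
    for t in range(horizon):
        # Assumed availability rule: each source is reachable w.p. 0.8.
        available = [i for i in range(n) if rng.random() < 0.8]
        if not available:
            available = [rng.randrange(n)]
        arm = policy(beliefs, available, rng)
        signal = rng.random() < (rho_good if states[arm] else rho_bad)
        total += (discount ** t) * (1.0 if signal else 0.0)
        beliefs[arm] = bayes_update(beliefs[arm], signal, rho_good, rho_bad)
        # Every arm evolves whether or not it was played (restless).
        for i in range(n):
            p_good = p_stay[i] if states[i] else p_rec[i]
            states[i] = 1 if rng.random() < p_good else 0
            beliefs[i] = predict(beliefs[i], p_stay[i], p_rec[i])
    return total
```

Calling `simulate(myopic_policy)` and `simulate(uniform_policy)` over many seeds and averaging the results gives a rough comparison of the two baselines under these assumed dynamics.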
Similar resources
Lazy Restless Bandits for Decision Making with Limited Observation Capability: Applications in Wireless Networks
In this work we formulate the problem of restless multi-armed bandits with cumulative feedback and partially observable states. We call these bandits lazy restless bandits (LRB), as they are slow in action and allow multiple system state transitions during every decision interval. Rewards for each action are state dependent. The states of arms are hidden from the decision maker. The goal of t...
Multi-armed Bandits with Constrained Arms and Hidden States
The problem of rested and restless multi-armed bandits with constrained availability of arms is considered. The states of arms evolve in a Markovian manner and the exact states are hidden from the decision maker. First, some structural results on value functions are claimed. Following these results, the optimal policy turns out to be a threshold policy. Further, indexability of rested bandits is ...
Leveraging Side Observations in Stochastic Bandits
This paper considers stochastic bandits with side observations, a model that accounts for both the exploration/exploitation dilemma and relationships between arms. In this setting, after pulling an arm i, the decision maker also observes the rewards for some other actions related to i. We will see that this model is suited to content recommendation in social networks, where users’ reactions may...
Interdependent Security Game Design over Constrained Linear Influence Networks
In today's highly interconnected networks, the security of entities is often interdependent. This means the security decisions of the agents are not only influenced by their own costs and constraints, but are also affected by their neighbors’ decisions. Game theory provides a rich set of tools to analyze such influence networks. In the game model, players try to maximize their utilities through se...
Marginal productivity index policies for scheduling restless bandits with switching penalties
We address the dynamic scheduling problem for discrete-state restless bandits, where sequence-independent setup penalties (costs or delays) are incurred when starting work on a project. We reformulate such problems as restless bandit problems without setup penalties, and then deploy the theory of marginal productivity indices (MPIs) and partial conservation laws (PCLs) we have introduced and de...
Journal title:
- CoRR
Volume abs/1801.03634, Issue
Pages -
Publication date 2018